Results 1-20 of 97
1.
Radiol Artif Intell ; 6(2): e240138, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38535965
3.
Nat Med ; 30(4): 1134-1142, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38413730

ABSTRACT

Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP) tasks, their effectiveness on a diverse range of clinical summarization tasks remains unproven. Here we applied adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks: radiology reports, patient questions, progress notes and doctor-patient dialogue. Quantitative assessments with syntactic, semantic and conceptual NLP metrics reveal trade-offs between models and adaptation methods. A clinical reader study with 10 physicians evaluated summary completeness, correctness and conciseness; in most cases, summaries from our best-adapted LLMs were deemed either equivalent (45%) or superior (36%) compared with summaries from medical experts. The ensuing safety analysis highlights challenges faced by both LLMs and medical experts, as we connect errors to potential medical harm and categorize types of fabricated information. Our research provides evidence of LLMs outperforming medical experts in clinical text summarization across multiple tasks. This suggests that integrating LLMs into clinical workflows could alleviate documentation burden, allowing clinicians to focus more on patient care.
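The syntactic NLP metrics this study class covers are typically n-gram overlap scores between a candidate summary and a reference. A minimal ROUGE-1-style unigram F1 sketch, purely illustrative (the function name and whitespace tokenization are assumptions, not the study's implementation):

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Semantic and conceptual metrics replace this surface-token match with embedding similarity or clinical-concept overlap, but keep the same precision/recall framing.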


Subjects
Documentation, Semantics, Humans, Electronic Health Records, Natural Language Processing, Physician-Patient Relations
4.
Radiol Cardiothorac Imaging ; 6(1): e240046, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38385760
6.
Eur Radiol ; 34(4): 2727-2737, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37775589

ABSTRACT

OBJECTIVES: There is a need for CT pulmonary angiography (CTPA) lung segmentation models. Clinical translation requires radiological evaluation of model outputs, understanding of limitations, and identification of failure points. This multicentre study aims to develop an accurate CTPA lung segmentation model, with evaluation of outputs in two diverse patient cohorts with pulmonary hypertension (PH) and interstitial lung disease (ILD). METHODS: This retrospective study develops an nnU-Net-based segmentation model using data from two specialist centres (UK and USA). The model was trained (n = 37), tested (n = 12), and clinically evaluated (n = 176) on a diverse 'real-world' cohort of 225 PH patients with volumetric CTPAs. Dice score coefficient (DSC) and normalised surface distance (NSD) were used for testing. Clinical evaluation of outputs was performed by two radiologists, who assessed the clinical significance of errors. External validation was performed on heterogeneous contrast and non-contrast scans from 28 ILD patients. RESULTS: A total of 225 PH and 28 ILD patients with diverse demographic and clinical characteristics were evaluated. Mean accuracy, DSC, and NSD scores were 0.998 (95% CI 0.9976, 0.9989), 0.990 (0.9840, 0.9962), and 0.983 (0.9686, 0.9972), respectively. There were no segmentation failures. On radiological review, 82% of internal and 71% of external cases had no errors; 18% and 25%, respectively, had clinically insignificant errors. Peripheral atelectasis and consolidation were common causes of suboptimal segmentation. One external case (0.5%) with a patulous oesophagus had a clinically significant error. CONCLUSION: This state-of-the-art CTPA lung segmentation model provides accurate outputs with minimal clinical errors on evaluation across two diverse cohorts with PH and ILD. CLINICAL RELEVANCE: Clinical translation of artificial intelligence models requires radiological review and understanding of model limitations.
This study develops an externally validated state-of-the-art model with robust radiological review. Intended clinical use is in techniques such as lung volume or parenchymal disease quantification. KEY POINTS: • Accurate, externally validated CT pulmonary angiography (CTPA) lung segmentation model tested in two large heterogeneous clinical cohorts (pulmonary hypertension and interstitial lung disease). • There were no segmentation failures, and robust radiologist review of model outputs found one (0.5%) clinically significant segmentation error. • Intended clinical use of this model is as a necessary step in techniques such as lung volume quantification, parenchymal disease quantification, or pulmonary vessel analysis.
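The Dice score coefficient (DSC) used for testing is the standard mask-overlap measure 2|A∩B| / (|A| + |B|). A minimal sketch on binary masks (illustrative only, not the study's nnU-Net evaluation code; the empty-mask convention is an assumption):

```python
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice score coefficient between two binary segmentation masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom
```

Normalised surface distance (NSD), by contrast, compares mask *boundaries* within a tolerance, so it penalises contour errors that a volume-overlap score like DSC can hide.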


Subjects
Deep Learning, Pulmonary Hypertension, Interstitial Lung Diseases, Humans, Pulmonary Hypertension/diagnostic imaging, Artificial Intelligence, Retrospective Studies, X-Ray Computed Tomography, Interstitial Lung Diseases/diagnostic imaging, Lung
7.
JAMA Netw Open ; 6(12): e2345892, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38039004

ABSTRACT

Importance: The lack of data quality frameworks to guide the development of artificial intelligence (AI)-ready data sets limits their usefulness for machine learning (ML) research in health care and hinders the diagnostic excellence of developed clinical AI applications for patient care. Objective: To discern what constitutes high-quality and useful data sets for health and biomedical ML research purposes according to subject matter experts. Design, Setting, and Participants: This qualitative study interviewed data set experts, particularly those who are creators and ML researchers. Semistructured interviews were conducted in English and remotely through a secure video conferencing platform between August 23, 2022, and January 5, 2023. A total of 93 experts were invited to participate; 20 were enrolled and interviewed. Through purposive sampling, the experts were affiliated with a diverse representation of 16 health data sets/databases across organizational sectors. Content analysis was used to evaluate survey information, and thematic analysis was used to analyze interview data. Main Outcomes and Measures: Data set experts' perceptions of what makes data sets AI ready. Results: Participants included 20 data set experts (11 [55%] men; mean [SD] age, 42 [11] years), all of whom were health data set creators; 18 of the 20 were also ML researchers. Themes (3 main and 11 subthemes) were identified and integrated into an AI-readiness framework to show their association within the health data ecosystem. Participants partially determined the AI readiness of data sets using priority appraisal elements of accuracy, completeness, consistency, and fitness. Ethical acquisition and societal impact emerged as appraisal considerations that have not been described to date in prior data quality frameworks.
Factors that drive creation of high-quality health data sets and mitigate risks associated with data reuse in ML research were also relevant to AI readiness. The state of data availability, data quality standards, documentation, team science, and incentivization were associated with elements of AI readiness and the overall perception of data set usefulness. Conclusions and Relevance: In this qualitative study of data set experts, participants contributed to the development of a grounded framework for AI data set quality. Data set AI readiness required the concerted appraisal of many elements and the balancing of transparency and ethical reflection against pragmatic constraints. The movement toward more reliable, relevant, and ethical AI and ML applications for patient care will inevitably require strategic updates to data set creation practices.


Subjects
Artificial Intelligence, Adult, Female, Humans, Male, Delivery of Health Care, Machine Learning, Qualitative Research
8.
JAMA Netw Open ; 6(12): e2348422, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38113040

ABSTRACT

Importance: Limited sharing of data sets that accurately represent disease and patient diversity limits the generalizability of artificial intelligence (AI) algorithms in health care. Objective: To explore the factors associated with organizational motivation to share health data for AI development. Design, Setting, and Participants: This qualitative study investigated organizational readiness for sharing health data across the academic, governmental, nonprofit, and private sectors. Using a multiple case studies approach, 27 semistructured interviews were conducted with leaders in data-sharing roles from August 29, 2022, to January 9, 2023. The interviews were conducted in the English language using a video conferencing platform. Using a purposive and nonprobabilistic sampling strategy, 78 individuals across 52 unique organizations were identified. Of these, 35 participants were enrolled. Participant recruitment concluded after 27 interviews, as theoretical saturation was reached and no additional themes emerged. Main Outcome and Measure: Concepts defining organizational readiness for data sharing and the association between data-sharing factors and organizational behavior were mapped through iterative qualitative analysis to establish a framework defining organizational readiness for sharing clinical data for AI development. Results: Interviews included 27 leaders from 18 organizations (academia: 10, government: 7, nonprofit: 8, and private: 2). Organizational readiness for data sharing centered around 2 main constructs: motivation and capabilities. Motivation related to the alignment of an organization's values with data-sharing priorities and was associated with its engagement in data-sharing efforts. However, organizational motivation could be modulated by extrinsic incentives for financial or reputational gains. Organizational capabilities comprised infrastructure, people, expertise, and access to data. 
Cross-sector collaboration was a key strategy for mitigating barriers to accessing health data. Conclusions and Relevance: This qualitative study identified sector-specific factors that may affect the data-sharing behaviors of health organizations. External incentives may bolster cross-sector collaborations by helping overcome barriers to accessing health data for AI development. The findings suggest that tailored incentives may boost organizational motivation and facilitate a sustainable flow of health data for AI development.


Subjects
Artificial Intelligence, Delivery of Health Care, Humans, Private Sector, Information Dissemination, Motivation
9.
Res Sq ; 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37961377

ABSTRACT

Sifting through vast textual data and summarizing key information from electronic health records (EHR) imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown immense promise in natural language processing (NLP) tasks, their efficacy on a diverse range of clinical summarization tasks has not yet been rigorously demonstrated. In this work, we apply domain adaptation methods to eight LLMs, spanning six datasets and four distinct clinical summarization tasks: radiology reports, patient questions, progress notes, and doctor-patient dialogue. Our thorough quantitative assessment reveals trade-offs between models and adaptation methods in addition to instances where recent advances in LLMs may not improve results. Further, in a clinical reader study with ten physicians, we show that summaries from our best-adapted LLMs are preferable to human summaries in terms of completeness and correctness. Our ensuing qualitative analysis highlights challenges faced by both LLMs and human experts. Lastly, we correlate traditional quantitative NLP metrics with reader study scores to enhance our understanding of how these metrics align with physician preferences. Our research marks the first evidence of LLMs outperforming human experts in clinical text summarization across multiple tasks. This implies that integrating LLMs into clinical workflows could alleviate documentation burden, empowering clinicians to focus more on personalized patient care and the inherently human aspects of medicine.
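Correlating automated NLP metric values with reader study scores, as described above, is usually done with a rank correlation so that only the ordering of summaries matters, not the metric's scale. A minimal Spearman rho sketch (average ranks for ties; illustrative, not the study's analysis code):

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation between two paired score lists."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(order):  # assign average ranks to tied runs
            j = i
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```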

10.
Radiology ; 309(1): e231114, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37874234
11.
Patterns (N Y) ; 4(9): 100802, 2023 Sep 08.
Article in English | MEDLINE | ID: mdl-37720336

ABSTRACT

Artificial intelligence (AI) models for automatic generation of narrative radiology reports from images have the potential to enhance efficiency and reduce the workload of radiologists. However, evaluating the correctness of these reports requires metrics that can capture clinically pertinent differences. In this study, we investigate the alignment between automated metrics and radiologists' scoring of errors in report generation. We address the limitations of existing metrics by proposing new metrics, RadGraph F1 and RadCliQ, which demonstrate stronger correlation with radiologists' evaluations. In addition, we analyze the failure modes of the metrics to understand their limitations and provide guidance for metric selection and interpretation. This study establishes RadGraph F1 and RadCliQ as meaningful metrics for guiding future research in radiology report generation.
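RadGraph F1 scores a generated report by the overlap of its extracted clinical entities with those of the reference report. A minimal sketch of such an entity-match F1, with entities given as plain sets (a simplifying assumption: the real metric extracts entities and relations using the RadGraph information-extraction model):

```python
def entity_f1(pred_entities: set, true_entities: set) -> float:
    """F1 over clinical entities shared by a generated and a reference report."""
    if not pred_entities and not true_entities:
        return 1.0  # nothing to extract on either side
    overlap = len(pred_entities & true_entities)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_entities)
    recall = overlap / len(true_entities)
    return 2 * precision * recall / (precision + recall)
```

This framing explains why such a metric can track clinically pertinent differences better than surface n-gram overlap: a paraphrase preserving the same findings keeps the same entity set.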

12.
JAMIA Open ; 6(3): ooad054, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37545984

ABSTRACT

Objective: To describe the infrastructure, tools, and services developed at Stanford Medicine to maintain its data science ecosystem and research patient data repository for clinical and translational research. Materials and Methods: The data science ecosystem, dubbed the Stanford Data Science Resources (SDSR), includes infrastructure and tools to create, search, retrieve, and analyze patient data, as well as services for data deidentification, linkage, and processing to extract high-value information from healthcare IT systems. Data are made available via self-service and concierge access, on HIPAA-compliant secure computing infrastructure supported by in-depth user training. Results: The Stanford Medicine Research Data Repository (STARR) functions as the SDSR data integration point and includes electronic medical records, clinical images, text, bedside monitoring data, and HL7 messages. SDSR tools include electronic phenotyping and cohort-building tools and a search engine for patient timelines. The SDSR supports patient data collection, reproducible research, and teaching using healthcare data, and facilitates industry collaborations and large-scale observational studies. Discussion: Research patient data repositories and their underlying data science infrastructure are essential to realizing a learning health system and advancing the mission of academic medical centers. Challenges to maintaining the SDSR include ensuring sufficient financial support while providing researchers and clinicians with maximal access to data and digital infrastructure, balancing tool development with user training, and supporting the diverse needs of users. Conclusion: Our experience maintaining the SDSR offers a case study for academic medical centers developing data science and research informatics infrastructure.
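Cohort building of the kind mentioned above reduces, in its simplest form, to filtering patient encounters by diagnosis code and demographics. A toy sketch (the `Encounter` fields and `build_cohort` helper are hypothetical illustrations, not STARR's actual API or schema):

```python
from dataclasses import dataclass

@dataclass
class Encounter:
    patient_id: str
    icd10: str  # diagnosis code recorded at this encounter
    age: int    # patient age at the encounter

def build_cohort(encounters, codes, min_age=18):
    """Select patients with at least one qualifying diagnosis at or above a minimum age."""
    return sorted({e.patient_id for e in encounters
                   if e.icd10 in codes and e.age >= min_age})
```

Production phenotyping layers temporal logic, medication and lab criteria, and exclusion rules on top of this kind of base filter.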

18.
J Am Med Inform Assoc ; 30(2): 318-328, 2023 Jan 18.
Article in English | MEDLINE | ID: mdl-36416419

ABSTRACT

OBJECTIVE: To develop an automated deidentification pipeline for radiology reports that detects protected health information (PHI) entities and replaces them with realistic surrogates "hiding in plain sight." MATERIALS AND METHODS: In this retrospective study, 999 chest X-ray and CT reports collected between November 2019 and November 2020 were annotated for PHI at the token level and combined with 3001 X-ray reports and 2193 medical notes previously labeled, forming a large multi-institutional, cross-domain dataset of 6193 documents. Two radiology test sets, from a known and a new institution, as well as the i2b2 2006 and 2014 test sets, served as evaluation sets to estimate model performance and to compare it with previously released deidentification tools. Several PHI detection models were developed based on different training datasets, fine-tuning approaches, data augmentation techniques, and a synthetic PHI generation algorithm. These models were compared using metrics such as precision, recall, and F1 score, as well as paired-samples Wilcoxon tests. RESULTS: Our best PHI detection model achieves an F1 score of 97.9 on radiology reports from a known institution, 99.6 on reports from a new institution, 99.5 on i2b2 2006, and 98.9 on i2b2 2014. On reports from a known institution, it achieves 99.1 recall in detecting the core of each PHI span. DISCUSSION: Our model outperforms all deidentifiers it was compared to on all test sets, as well as human labelers on the i2b2 2014 data. It enables accurate and automatic deidentification of radiology reports. CONCLUSIONS: A transformer-based deidentification pipeline can achieve state-of-the-art performance for deidentifying radiology reports and other medical documents.
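The "hiding in plain sight" strategy replaces each detected PHI span with a realistic surrogate of the same category, so any PHI the detector misses blends in with the surrogates instead of standing out. A toy sketch of the replacement step only (the paper's pipeline detects PHI with a transformer model, not regexes; the patterns and surrogate values here are invented for illustration):

```python
import re

# Category-specific surrogates: a missed real date among fake-but-plausible
# dates is far less conspicuous than one among [REDACTED] tags.
PATTERNS = [
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "03/15/2018"),   # dates
    (re.compile(r"\bMRN[: ]+\d+\b"), "MRN: 4821937"),       # record numbers
]

def deidentify(text: str) -> str:
    """Replace detected PHI spans with realistic same-category surrogates."""
    for pattern, surrogate in PATTERNS:
        text = pattern.sub(surrogate, text)
    return text
```

A real pipeline would also draw surrogates randomly per document and keep a consistent mapping, so the same patient date maps to the same fake date within a report.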


Subjects
Data Anonymization, Radiology, Humans, Retrospective Studies, Algorithms, Health Facilities, Natural Language Processing
19.
J Digit Imaging ; 36(1): 164-177, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36323915

ABSTRACT

Building a document-level classifier for COVID-19 on radiology reports could help assist providers in their daily clinical routine, as well as create large numbers of labels for computer vision models. We have developed such a classifier by fine-tuning a BERT-like model initialized from RadBERT, a model continuously pre-trained on radiology reports that can be used for all radiology-related tasks. RadBERT outperforms all biomedical pre-trainings on this COVID-19 task (P < 0.01) and helps our fine-tuned model achieve an 88.9 macro-averaged F1-score when evaluated on both X-ray and CT reports. To build this model, we rely on a multi-institutional dataset re-sampled and enriched with concurrent lung diseases, helping the model to resist distribution shifts. In addition, we explore a variety of fine-tuning and hyperparameter optimization techniques that accelerate fine-tuning convergence, stabilize performance, and improve accuracy, especially when data or computational resources are limited. Finally, we provide a set of visualization tools and explainability methods to better understand the performance of the model and support its practical use in the clinical setting. Our approach offers a ready-to-use COVID-19 classifier and can be applied similarly to other radiology report classification tasks.
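The macro-averaged F1 reported here averages per-class F1 scores without weighting by class frequency, so rare classes (such as enriched concurrent lung diseases) count as much as common ones. A minimal sketch of the metric (illustrative; the label names in the test are made up):

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: one-vs-rest F1 per class, then an unweighted mean."""
    f1s = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        if tp == 0:
            f1s.append(0.0)  # convention: undefined precision/recall scores 0
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1s.append(2 * precision * recall / (precision + recall))
    return sum(f1s) / len(f1s)
```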


Subjects
COVID-19, Radiology, Humans, Research Report, Natural Language Processing